Statistics for Corpus Linguists

1.2 Linguistic variables

Theory
Authors: Thomas Brunner and Vladimir Buskin
Affiliation: Catholic University of Eichstätt-Ingolstadt

Abstract
This handout introduces linguistic variables from classical and sociolinguistic perspectives, explores their subtypes and salience, discusses the principle of accountability, and provides examples of morphosyntactic variation in English.

What is a linguistic variable?

  1. The classical view: Labov (1972) defines a linguistic variable as “two ways of saying the same thing.”

Labov (1978: 7-8) explains that the linguistic variable should ideally be salient on several levels:

  • It should occur often enough that even small data samples hint at its distributional idiosyncrasies.
  • It should be at the centre of the linguistic system.
  • Previous research should hint at the influence of extra-linguistic factors.
  • On the level of the individual speaker, a limited awareness of the variable realisations is desirable, but this awareness should not be too high, lest it skew the nature of their linguistic output.

  2. A restriction: Meyerhoff (2009: 11) summarises: “In sum, a sociolinguistic variable can be defined as a linguistic variable that is constrained by social or non-linguistic factors […]”

  3. A more open view: Kiesling (2011) argues: “Given the variability of what counts as a variable, we must define what counts as a variable more broadly than ‘two or more ways of saying the same thing’. We will simply say that a linguistic variable is a choice or option about speaking in a speech community… Note that this definition does not in any way require us to state that the meaning be the same, although there should be some kind of equivalence noted.”

Subtypes of variables

Linguistic perspective

  1. Phonetic/phonological
  2. Morphological
  3. Syntactic
  4. Pragmatic

Sociolinguistic perspective

Sociolinguistic variables also differ with regard to their salience in society.

  1. Stereotypes are strongly socially marked and part of popular discourse about language.
    • h-dropping in Cockney
    • Canadian eh at the end of sentences
    • Australian dinkum ‘authentic, genuine’: I was fair dinkum about my interest in their culture
  2. Markers show both social and style stratification; all members of a society react similarly in taking care to avoid the pattern in formal registers.
    • (r)
    • (th)
  3. Indicators differentiate social groups. However, people are not aware of them and therefore do not avoid them in formal registers.
    • Same vowel in God and Guard in New York City

Cf. Mesthrie (2011).

Many morphosyntactic variables in English

| Variable | Example |
|---|---|
| Indefinite pronouns | everybody vs. everyone |
| Case and order of coordinated pronouns | my husband and I vs. my husband and me vs. me and my husband |
| that vs. zero complementation | I don’t think that/Ø it’s a problem. |
| that vs. gerundial complementation | remember that vs. remember V-ing; try to vs. try and vs. try V-ing |
| Particle placement | set the computer up vs. set up the computer |
| The dative alternation | give the book to John vs. give John the book |
| The genitive alternation | John’s house vs. the house of John |
| Relativization strategies | wh-word vs. that vs. Ø |
| Analytic vs. synthetic comparatives | warmer vs. more scary |
| Plural existentials | there are some places vs. there’s some places |
| Future temporal reference | will vs. going to vs. progressive etc. |
| Deontic modality | must vs. have to vs. need to vs. got to etc. |
| Stative possession | have vs. have got vs. got |
| Quotatives | say vs. be like vs. go etc. |
| not vs. no | not anybody vs. nobody; not anyone vs. no one; not anything vs. nothing |
| NOT vs. AUX contraction | that’s not vs. that isn’t etc. |

Cf. Gardner et al. (2021).
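
To make the idea of a variable with competing variants more concrete, the following minimal sketch counts the two variants of the indefinite pronoun variable (everybody vs. everyone) and reports each variant's share of all tokens of the variable. The sample text, the names `SAMPLE` and `VARIANTS`, and the function `variant_proportions` are hypothetical and serve only as an illustration; an actual study would draw its tokens from a corpus and would usually require manual checking of the hits.

```python
import re
from collections import Counter

# Hypothetical toy sample; a real study would read corpus data instead.
SAMPLE = (
    "Everybody knows that everyone was invited. "
    "Everyone agreed, but not everybody came. "
    "Everyone left early."
)

# Variants of the variable 'indefinite pronoun in -body vs. -one'.
VARIANTS = ("everybody", "everyone")


def variant_proportions(text, variants):
    """Count each variant and return (counts, proportions).

    Proportions are computed over all tokens of the variable, i.e. over
    the sum of the variant counts, not over the total number of words.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tok for tok in tokens if tok in variants)
    total = sum(counts.values())
    proportions = {v: (counts[v] / total if total else 0.0) for v in variants}
    return counts, proportions


counts, proportions = variant_proportions(SAMPLE, VARIANTS)
for variant in VARIANTS:
    print(f"{variant}: {counts[variant]} tokens, {proportions[variant]:.0%}")
```

In this sketch the denominator is the number of contexts in which the speaker had a choice between the variants, so each proportion describes a preference for one way of saying the same thing rather than a raw frequency.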

Exercises

Exercise 1 Which of the following variables could be considered ‘good’ sociolinguistic variables, and which of them poor ones? Justify your answer.

  1. /fɔːθ flɔː/ vs. /fɔːrθ flɔːr/
  2. This enables him to preside over the process which I have described vs. This enables him to preside over the process that I have described vs. This enables him to preside over the process ∅ I have described.
  3. The pair found the briefcase on a bus station bench at Bath central bus station. vs. The briefcase was found on a bus station bench at Bath central bus station by the pair.
  4. Art is after all the subject of attention for both critic and historian, even though the functions and methods of the two sorts of writer have drawn apart. vs. Art histories often make an attempt to keep to chronology, although the difficulties include the crucial fact that in art there is no clear sequence of events. vs. Many of his readers approved his sensitive and appreciative understanding of paintings, though without sharing his political views.
  5. /pleɪŋ/ vs. /pleɪn/
  6. [tʰ] in /tɔp/ vs. [t] in stop.

Exercise 2 Two linguists aim to study the preference for passives among men and women. They extract all the passives from 500,000 words of male speech and all passives from 500,000 words of female speech and report the results. What is wrong with this approach?

References

Gardner, Matt Hunt et al. 2021. “Variation Isn’t That Hard: Morphosyntactic Choice Does Not Predict Production Difficulty.” PLoS ONE 16 (6): e0252602.
Kiesling, Scott F. 2011. Linguistic Variation and Change. Edinburgh: Edinburgh University Press.
Labov, William. 1972. Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
———. 1978. Sociolinguistic Patterns. Oxford: Blackwell.
Mesthrie, Rajend. 2011. Introducing Sociolinguistics. 2nd ed. Edinburgh: Edinburgh University Press.
Meyerhoff, Miriam. 2009. Introducing Sociolinguistics. London: Routledge.